SUBMICRON  SYSTEMS  ARCHITECTURE


The  Tree  Machine  Project 


An  assembly  language  has  been  defined  for  the  tree  machine  and  its  instruction  set. 
The  language  supports  both  the  notation  proposed  and  used  by  Sally  Browning  and 
the  notation  proposed  and  used  by  Marina  Chen  in  her  work  on  HARMOS.  The 
assembly  language  contains  pseudo  instructions  for  the  definition  of  the 
interconnections  in  the  logical  tree  and  the  logical  name  of  the  ports.  Macro 
definition  and  expansion  features  are  included.  These  features  are  used  in  the 
generation  of  padding  macros  for  mapping  the  logical  tree  onto  the  physical  tree. 


The  assembler  works  in  three  passes.  It  generates  the  object  code  for  each 
processor,  does  the  conversion  from  the  logical  tree  to  the  physical  tree,  and 
produces  information  required  by  the  loader.  The  loader  loads  a  precoded  bootstrap 
loader  into  each  node  of  the  tree  before  it  loads  the  machine  code  with  proper 
headers  into  the  tree.  Each  processor  in  the  tree  runs  its  bootstrap  loader  and 
stores  its  own  code  into  its  own  memory.  The  host  computer  initiates  execution
on  completion  of  the  loading.


Work  on  the  processor  layout  is  underway  again,  now  that  Earl  (q.v.)  is  starting  to
work. 


The  COPE  Machine 


The  COPE  Machine  (Class  Object  Programming  Engine)  attempts  to  exploit  the 
concurrency  available  in  programming  languages  having  constructs  similar  to  the
class  concept  in  Simula.  The  COPE  machine  is  an  abstract  machine  in  which  objects 
can  execute  concurrently.  The  physical  structure  and  organization  of  the  hardware  is 
transparent  to  the  user. 


An  object  is  identified  by  a  unique  "handle",  which  also  defines  the  class  of  the
object.  Objects  contain  code  and  data.  The  code  is  shared  with  all  other  objects  in
the  same  class,  while  the  data  is  unique  for  the  object.  Objects  communicate  and 
synchronize  via  messages.  An  object  can  send  messages  to  all  other  objects  to 
which  it  has  handles.  Messages  are  queued  by  the  receiving  object. 
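
The object model above can be illustrated with a small sketch. It is not the COPE simulator: the names Handle, Machine, CopeObject and Accumulator are hypothetical, and round-robin delivery merely stands in for objects executing concurrently. It shows a handle that names the class, code shared through the class, data held per object, and messages queued by the receiver.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Handle:
    """Unique identifier that also names the class of the object."""
    class_name: str
    serial: int


class Machine:
    """Hypothetical stand-in for the abstract machine: owns objects, routes messages."""

    def __init__(self):
        self.objects = {}
        self._serials = count()

    def create(self, cls):
        handle = Handle(cls.__name__, next(self._serials))
        self.objects[handle] = cls(self, handle)
        return handle

    def send(self, target, message):
        # Messages are queued by the receiving object.
        self.objects[target].inbox.append(message)

    def run(self):
        # Assumption: round-robin delivery stands in for concurrent execution.
        progressed = True
        while progressed:
            progressed = False
            for obj in list(self.objects.values()):
                if obj.inbox:
                    obj.receive(obj.inbox.popleft())
                    progressed = True


class CopeObject:
    """Code lives on the class (shared); data lives on the instance (unique)."""

    def __init__(self, machine, handle):
        self.machine = machine
        self.handle = handle
        self.inbox = deque()

    def receive(self, message):
        raise NotImplementedError


class Accumulator(CopeObject):
    def __init__(self, machine, handle):
        super().__init__(machine, handle)
        self.total = 0          # per-object data
        self.observer = None    # a handle this object has been given

    def receive(self, message):
        kind, value = message
        if kind == "observe":
            self.observer = value
        elif kind == "add":
            self.total += value
            if self.observer is not None:
                # An object can only send to objects whose handles it holds.
                self.machine.send(self.observer, ("add", value))


machine = Machine()
first = machine.create(Accumulator)
second = machine.create(Accumulator)
machine.send(first, ("observe", second))
machine.send(first, ("add", 2))
machine.send(first, ("add", 3))
machine.run()
```

After the run, both accumulators hold the same total, because the first forwards every "add" message to the handle it was given.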


A  simulator  is  currently  being  developed  for  the  evaluation  of  the  COPE  Machine 
concepts,  physical  structures  and  various  implementation  issues. 


The  Logarithm  Machine

Chips  for  the  Logarithm  Machine  have  recently  been  returned.  The  chips  are 
currently  being  tested. 


Mapping  of  one  communication  structure  onto  another 

The  mapping  of  an  algorithm  of  a  certain  communication  structure  onto  an  array  of  a 
different  structure  can  be  divided  into  several  subclasses  of  problems.  One  such 
class  is  the  mapping  necessary  to  use  an  array  designed  for  a  particular  algorithm 
and  problem  size  for  the  same  algorithm  but  a  problem  size  larger  than  the  array  can 
accommodate  directly.  The  control  of  the  data  flow,  which  typically  is  implicit  in  the
interconnection  scheme  for  the  basic  arrays,  has  to  be  made  explicit.  We  are
currently  studying  suitable  control  and  data  organizations  for  orthogonal  arrays  for
the  solution  of  band  matrix  equations.  For  the  array  and  problem  studied,  a  simple
modification  of  the  cells  and  the  control  of  the  array  makes  it  readily  useful  for
"oversize"  problems.  There  is,  however,  a  need  for  fairly  extensive  data
management  outside  the  array.  The  external  data  management  can  be  simplified  at
the  expense  of  more  complex  cells  in  the  array.

In  the  computational  arrays  for  the  Discrete  Fourier  Transform  described  next,  the 
mapping  is  made  by  exploiting  symmetry  and  other  special  properties  of  the  data. 

The  mapping  of  large  trees  onto  ensembles  of  a  limited  number  of  processors 
interconnected  in  regular  patterns  is  also  being  studied. 


Formal  description  of  Computational  Networks 

A  formal  description  of  computational  networks  is  being  developed  in  collaboration 
with  Danny  Cohen  at  USC/Information  Sciences  Institute.  The  goal  is  to  be  able  to
proceed  in  an  entirely  formal  way  from  a  mathematical  expression  defining  a  function 
to  be  computed  to  a  description  of  a  computational  network  computing  the  desired 
function.  In  the  initial  joint  efforts  the  notation  proposed  earlier  by  Cohen  has  been 
used  to  formally  derive  computational  arrays  for  the  Discrete  Fourier  Transform. 
Arrays  implementing  the  FFT  are  of  particular  interest  since  the  data  flow  is  not 
laminar. 

In  the  notation  used,  the  control  can  be  modeled  explicitly.  Explicit  modeling  of  the
control  is  necessary  when  the  resources  are  shared  over  time.


Wire  Routing 

The  building  of  a  small  machine  (16  -  32  Pathfinder  chips)  is  near  completion.  The 
principle  of  analog  control  of  digital  computations,  as  used  in  the  Pathfinder  chip,  is
being  evaluated.  The  project  is  expected  to  be  completed  within  the  next  few 
months. 
Testing

Our  development  of  a  tester  and  a  test  language  has  now  progressed  to  a  stage
where  both  our  medium  tester  and  the  proposed  test  language  are  being  used  on  an
experimental  basis.  A  preliminary  user's  manual  is  available  and  will  be  handed  out
during  the  testing  session. 


Local  Network 


Presently,  we  are  exploring  network  connection  schemes.  We  have  had  a  version  of
the  Internet  Protocol  running  on  our  nodes  for  some  time.  Since  the  systems  software
on  the  nodes  provides  event  scheduled  multi-processing,  our  connection  management
protocols  provide  inter-network  inter-process  communications.  To  complete  our  effort
to  provide  a  network  that  acts  as  a  terminal  concentrator,  we  plan  to  make  our
connection  management  software  more  robust,  for  instance  by  modifying  the  process
scheduling  software  to  handle  situations  that  involve  waiting  for  multiple  conditions  to
become  true.


Earl 

A  new  design  system,  Earl,  an  interpretive  "silicon  assembler,"  is  now  sufficiently
complete  that  it  is  starting  to  be  used  by  a  small  group  of  designers.  Bugs  are  being 
removed  and  new  features  added.  Earl  will  be  available  for  distribution  this  summer. 
If  you  are  interested  in  getting  copies  of  our  VAX  software  package,  including  CLAP 
(a  C  language  based  LAP),  CIF20P  (a  CIF  2.0  plotter),  and  Earl,  send  mail  to
Chuck@CIT-20.

Earl  is  written  in  C  to  run  on  a  VAX  under  Berkeley  Unix.  It  allows  the  definition  of  cell
geometry  with  constraints,  and  maintains  information  about  the  ports  of  a  cell.  Its
composition  operators  do  constraint  solving  to  fit  cells  or  compositions  of  cells
together  by  abutment,  stretching  the  geometry  as  required.  Earl  also  includes  an 
interesting  geometrical  primitive  that  allows  one  to  specify  a  wire  path  between 
points  while  "missing"  one  or  more  other  points  clockwise  or  counterclockwise  by  a 
specified  radius.  Earl  thus  allows  not  only  "funny  angles"  but  even  circular  arcs. 


Notations  for  design  of  Concurrent  Systems 

A  theorem  on  deadlock  in  concurrent  computing  environments  has  been  proved.  The 
theorem  states  that  a  system  is  free  of  deadlock  if  the  number  of  available
resources  in  contention  plus  the  number  of  processes  competing  for  these  resources
is  greater  than  the  total  number  of  resources  requested.
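
The counting condition can be checked on small cases with the sketch below. It rests on assumptions that the statement above does not spell out: a single resource type, processes that acquire one unit at a time, and processes that hold their units until their full request is met and then release everything. The function names are hypothetical. Under these assumptions the argument is short: in a deadlock every unit is held and each of the n processes still lacks at least one unit, so the total requested is at least m + n.

```python
DONE = -1  # marker for a process that obtained its full request and released it


def deadlock_free_by_count(resources, requests):
    """Sufficient condition: resources + number of processes > total requested."""
    return resources + len(requests) > sum(requests)


def can_deadlock(resources, requests):
    """Exhaustively search allocation states of the assumed model for a deadlock.

    Each request must be at least 1. A state records the units held by each
    process, or DONE once the process has finished.
    """
    start = tuple(0 for _ in requests)
    seen = {start}
    stack = [start]
    while stack:
        held = stack.pop()
        free = resources - sum(h for h in held if h != DONE)
        waiting = [i for i, h in enumerate(held) if h != DONE]
        if waiting and free == 0:
            # Every unfinished process needs another unit and none is free.
            return True
        for i in waiting:
            grabbed = held[i] + 1
            nxt = list(held)
            nxt[i] = DONE if grabbed == requests[i] else grabbed
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

For example, three processes that each request two units cannot deadlock on four units (4 + 3 > 6), but can on three units, where each process may end up holding one unit while waiting for another.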


Self-timed  Systems 

Quite  a  few  new  designs  for  self-timed  elements  in  nMOS  and  CMOS  technology  were
developed  in  this  period,  and  many  are  currently  queued  up  for  fabrication.  These
designs  include  self-timed  precharge  PLAs  that  are  denser  than  conventional  PLAs 
even  though  they  generate  completion  signals,  a  CMOS  arbiter,  and  an  interesting 
FIFO/LIFO  structure  (designed  by  an  ARPA  visitor  from  Linkabit,  Dr.  Klein  Gilhausen).


On  the  theoretical  side,  some  progress  was  made  (in  collaboration  with  Susan  Owicki 
at  Stanford)  in  verifying  the  sequencing  properties  of  compositions.  The  sequencing 
rules  for  the  elements  or  systems  being  composed  are  represented  in  a 
sequence-net  (s-net)  notation,  a  form  of  Petri  net,  and  the  reasoning  about
composing  the  parts  is  based  on  the  reduction  of  a  merged  s-net. 


Concurrency  Algebra 

Current  research  focuses  on  the  problem  posed  by  Petri  Nets  of  infinite  behavior.  All
finite  behaviors  can  be  proved  by  ordinary  induction.  To  show  two  infinite  strings  to
be  identical  requires  transfinite  induction,  which  requires  the  behavior  of  Petri  Nets
to  have  some  "compactness"  property. 


Logic  for  Program  Verification 

Logic  for  lambda  calculus  programs  is  being  studied.  Lattice  models  as  suggested  by 
Dana  Scott  are  used.  The  models  do  not  define  a  Boolean  algebra,  but  a 
pseudo-Boolean  or  Heyting  algebra.  Using  a  new  logic,  LAMBDA-LOGIC,  a  formalization 
of  program  verification  can  be  made. 


Language  Design 

Our  effort  to  design  a  unification  chip  is  now  completed.  The  chip  represents  a
hardware  implementation  of  the  unification  program  by  J.  A.  Robinson.  The  UNIF-CHIP
unifies  a  pair  of  expressions  and  computes  the  substitution  set  for  this  pair  of
expressions.  The  result  of  a  successful  unification  is  stored  off  chip  in  a  RAM.  The
chip  consists  of  three  parts:  a  controller,  a  data  register,  and  a  stack  memory.  The  size
of  the  chip  is  approximately  3000  x  4500  lambda. 
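
For readers unfamiliar with unification, the following is a minimal software sketch of syntactic unification in the style of Robinson's algorithm. It is not a description of the UNIF-CHIP's circuitry or data format. The term representation is an assumption made for illustration: variables are Var objects, constants are strings, and compound expressions are tuples whose first element is the function symbol. Pending subproblems are kept on an explicit stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable; constants are plain strings, compounds are tuples."""
    name: str


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """True if var appears inside term under the current substitution."""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(a, b):
    """Return a substitution (dict) unifying a and b, or None on failure.

    Note that an empty dict is a successful result with no bindings.
    """
    subst = {}
    stack = [(a, b)]  # pairs of subexpressions still to be unified
    while stack:
        s, t = stack.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(s, Var):
            if occurs(s, t, subst):
                return None          # occurs check fails
            subst[s] = t
        elif isinstance(t, Var):
            if occurs(t, s, subst):
                return None
            subst[t] = s
        elif (isinstance(s, tuple) and isinstance(t, tuple)
              and s[0] == t[0] and len(s) == len(t)):
            stack.extend(zip(s[1:], t[1:]))
        else:
            return None              # symbol or arity clash
    return subst


X, Y = Var("X"), Var("Y")
result = unify(("f", X, "b"), ("f", "a", Y))
```

Here result binds X to the constant a and Y to the constant b. Unifying X with ("f", X) fails the occurs check and returns None.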


